死锁案例二
文 | 杨一 on 运维
转 | 来源:公众号yangyidba
一、前言
死锁,其实是一个很有意思也很有挑战的技术问题,大概每个 DBA 都会在工作过程中遇见。关于死锁我会持续写一个系列的案例分析,希望能够对想了解死锁的朋友有所帮助。本文源于我们的生产案例:并发申请 gap 锁导致的死锁案例,与之前的 死锁案例一不同,本案例是因为RR模式下两个事务中的 sql 可以获取同一个 gap 锁,导致对方事务的 insert 相互等待,导致死锁的。
二、案例分析
2.1 测试环境准备
Percona server 5.6.24 事务隔离级别为 RR
CREATE TABLE `t4` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT ,
`kdt_id` int(11) unsigned NOT NULL ,
`admin_id` int(11) unsigned NOT NULL ,
`biz` varchar(20) NOT NULL DEFAULT '1' ,
`role_id` int(11) unsigned NOT NULL ,
`shop_id` int(11) unsigned NOT NULL DEFAULT '0' ,
`operator` varchar(20) NOT NULL DEFAULT '0' ,
`operator_id` int(11) NOT NULL DEFAULT '0' ,
`create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
`update_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '更新时间',
PRIMARY KEY (`id`),
UNIQUE KEY `uniq_kid_aid_biz_rid` (`kdt_id`,`admin_id`,`role_id`,`biz`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULTCHARSET=utf8;
INSERT INTO `t4` (`id`, `kdt_id`, `admin_id`, `biz`, `role_id`, `shop_id`, `operator`, `operator_id`, `create_time`, `update_time`)
VALUES
(1,10,1,'retail',1,0,'0',0,'2017-05-09 15:55:26','2017-05-09 15:55:26'),
(2,20,1,'retail',1,0,'0',0,'2017-05-09 15:55:40','2017-05-09 15:55:40'),
(3,30,1,'retail',1,0,'0',0,'2017-05-09 15:55:55','2017-05-09 15:55:55'),
(4,40,1,'retail',1,0,'0',0,'2017-05-09 15:56:06','2017-05-09 15:56:06'),
(5,50,1,'retail',1,0,'0',0,'2017-05-09 15:56:16','2017-05-09 15:56:16');
2.2 本测试案例场景是两个事务删除不存的行,然后在 insert 记录
2.3 死锁日志
------------------------
LATEST DETECTED DEADLOCK
------------------------
2017-09-11 14:51:03 7f78eaf25700
*** (1) TRANSACTION:
TRANSACTION 462308535, ACTIVE 20 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 360, 2 row lock(s), undo log entries 1
MySQL thread id 3584515, OS thread handle 0x7f78ea5f5700, query id 780258123 localhost root update
insert into t4(`kdt_id`, `admin_id`, `biz`, `role_id`, `shop_id`, `operator`, `operator_id`, `create_time`, `update_time`)
VALUES('18', '2', 'retail', '2', '0', '0', '0', CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 225 page no 4 n bits 72 index `uniq_kid_aid_biz_rid` of table `test`.`t4` trx id 462308535 lock_mode X locks gap before rec insert intention waiting
*** (2) TRANSACTION:
TRANSACTION 462308534, ACTIVE 29 sec inserting, thread declared inside InnoDB 5000
mysql tables in use 1, locked 1
3 lock struct(s), heap size 360, 2 row lock(s), undo log entries 1
MySQL thread id 3584572, OS thread handle 0x7f78eaf25700, query id 780258153 localhost root update
INSERT INTO t4(`kdt_id`, `admin_id`, `biz`, `role_id`, `shop_id`, `operator`, `operator_id`, `create_time`, `update_time`)
VALUES ('15', '1', 'retail', '2', '0', '0', '0', CURRENT_TIMESTAMP, CURRENT_TIMESTAMP)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 225 page no 4 n bits 72 index `uniq_kid_aid_biz_rid` of table `test`.`t4` trx id 462308534 lock_mode X locks gap before rec
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 225 page no 4 n bits 72 index `uniq_kid_aid_biz_rid` of table `test`.`t4` trx id 462308534 lock_mode X locks gap before rec insert intention waiting
*** WE ROLL BACK TRANSACTION (2)
2.4 死锁日志分析
首先根据《死锁案例一》 和《一个最不可思议的 MySQL 死锁分析》中强调 delete 不存在的记录是要加上 GAP 锁,事务日志中显示 Lock_mode X wait
1. T2 delete from t4 where kdt_id = 15 and admin_id = 1 and biz = 'retail' and role_id = '1'; 符合条件的记录不存在,导致 T2 先持有了(lock_mode X locks gap before rec) 锁住[(2,20,1,'retail',1,0)-(3,30,1,'retail',1,0)]的区间 ,防止符合条件的记录插入。
2. T1 的 delete 于 T2 的 delete 一样 同样申请了 (lock_mode X locks gap before rec) 锁住[(2,20,1,'retail',1,0)-(3,30,1,'retail',1,0)]的区间 。
It is also worth noting here that conflicting locks can be held on a gap by different transactions. For example, transaction A can hold a shared gap lock (gap S-lock) on a gap while transaction B holds an exclusive gap lock (gap X-lock) on the same gap. The reason conflicting gap locks are allowed is that if a record is purged from an index, the gap locks held on the record by different transactions must be merged.
3. T1 的insert 语句申请插入意向锁,但是插入意向锁和T2持有的X GAP (lock_mode X locks gap before rec) 冲突,故等待T2中的GAP 锁释放。
Gap locks in InnoDB are “purely inhibitive”, which means they only stop other transactions from inserting to the gap. They do not prevent different transactions from taking gap locks on the same gap. Thus, a gap X-lock has the same effect as a gap S-lock.
4. T2 的 insert 语句申请插入意向锁,但是插入意向锁和 T1 持有 X GAP (lock_mode X locks gap before rec) 冲突,故等待 T1 中的 GAP 锁释放。
T1(INSERT )等待T2(DELETE),T2(INSERT)等待T1(DELETE) 故而循环等待,出现死锁。有兴趣的读者朋友可以测试一下 delete 存在记录的场景。
2.6 如何解决呢
先select 检查一下看看是否存在,然后在删除。这里也存在两个或者多个会话并发执行同一个select where条件的,这里需要开发同学做处理。
使用insert into on deuplicate key语法不存在则插入,而不是先删除,再插入。
三、小结
RR 事务隔离级别和 GAP 锁是导致死锁的常见原因,但是业务逻辑设计不合理也会出发死锁,本文的案例通过修改业务逻辑最终将死锁解决。
扩展阅读
1. 漫谈死锁
2. 如何阅读死锁日志
3. 死锁案例一
-The End-
Vol.151
有赞技术团队
为 300 万商家,150 个行业,200 亿电商交易额
提供技术支持
微商城|零售|美业
微信公众号:有赞coder 微博:@有赞技术
技术博客:tech.youzan.com
The bigger the dream,
the more important the team.